
[diskann-vector] Support truly unaligned distances.#981

Open
hildebrandmw wants to merge 6 commits into main from mhildebr/super-unaligned

Conversation

@hildebrandmw hildebrandmw commented Apr 28, 2026

An internal user has a case where full-precision vectors (e.g. f32) are stored in completely unaligned buffers (e.g. alignment of 1), so the data must be copied into an aligned buffer before slices can be safely constructed. However, our distance function implementations use SIMDVector::load_unaligned under the hood, which is compatible with under-aligned pointers, so the copy is not needed for the computation itself.

This PR exposes a proper API to the DistanceProvider trait (via the Distance type) for invoking the SIMD implementations with unaligned pointers.
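
To make the motivation concrete, here is a minimal scalar sketch of computing a distance directly from byte buffers whose `f32` payload starts at an arbitrary offset. It is not the PR's SIMD implementation: `pack_with_offset`, `squared_l2_unaligned`, and `squared_l2_from_bytes` are hypothetical names, and `std::ptr::read_unaligned` stands in for `SIMDVector::load_unaligned`.

```rust
use std::ptr;

/// Copies `values` into a byte buffer after `offset` padding bytes, so the
/// first `f32` generally does not sit on a 4-byte boundary.
pub fn pack_with_offset(values: &[f32], offset: usize) -> Vec<u8> {
    let mut bytes = vec![0u8; offset];
    for v in values {
        bytes.extend_from_slice(&v.to_ne_bytes());
    }
    bytes
}

/// Scalar stand-in for a SIMD kernel (hypothetical): squared L2 distance over
/// two `f32` sequences that may start at any byte address.
///
/// # Safety
/// `a` and `b` must each be valid for reads of `len` consecutive `f32` values.
pub unsafe fn squared_l2_unaligned(a: *const f32, b: *const f32, len: usize) -> f32 {
    let mut sum = 0.0f32;
    for i in 0..len {
        // SAFETY: the caller guarantees `len` readable elements, and
        // `read_unaligned` imposes no alignment requirement on the pointer.
        let (x, y) = unsafe { (ptr::read_unaligned(a.add(i)), ptr::read_unaligned(b.add(i))) };
        let d = x - y;
        sum += d * d;
    }
    sum
}

/// Computes the distance straight from the byte buffers, with no aligning copy.
pub fn squared_l2_from_bytes(a: &[u8], b: &[u8], offset: usize, len: usize) -> f32 {
    let size = len * std::mem::size_of::<f32>();
    // Slicing bounds-checks both buffers before any raw pointer is formed.
    let (a, b) = (&a[offset..offset + size], &b[offset..offset + size]);
    // SAFETY: both sub-slices hold exactly `len * 4` initialized bytes.
    unsafe { squared_l2_unaligned(a.as_ptr().cast(), b.as_ptr().cast(), len) }
}
```

Forming a `&[f32]` over the same bytes would be undefined behavior whenever the start address is not 4-byte aligned, which is why the sketch never leaves raw pointers.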

Suggested Reviewing Order

  • diskann-wide: The implementations of SIMDVector::load* and SIMDVector::store* already support under-aligned pointers. This PR updates the documentation and restructures the load/store tests to verify this property (we were already relying on it in some of the quantized distance kernels). The new load/store tests pass under Miri.

  • unaligned.rs: A new UnalignedSlice type is added for unaligned slices. It is just a pointer + length pair with some validity requirements but no alignment requirement. Conversions from &[T] and &[T; N] are added, and the trait AsUnaligned replaces the use of AsRef<[T]> and the internal ToSlice traits.

    A test-only Buffer is used to purposely offset simple types to exercise the unaligned cases.

  • distance/simd.rs: The simd_op kernel is tweaked to accept AsUnaligned instead of AsRef. Checks have been added to the existing tests to ensure that the under-aligned versions are both Miri-compatible and yield exactly the same results as their properly aligned counterparts.

  • distance/implementations.rs: The architecture hooks and specialization are changed to use AsUnaligned. I've investigated the code generation, and the checks for impl FTarget<...> for Specialize<N, F> are sufficient to trigger constant propagation and full unrolling of small fixed-size kernels.

  • distance/distance_provider.rs: The Distance type is changed to pass UnalignedSlices across the function pointer boundary rather than raw slices. We can keep the existing API for slices trivially via AsUnaligned.
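
As a rough illustration of the "pointer + length pair" idea from the unaligned.rs item, the following sketch shows one way such a type and trait could look. Only the names `UnalignedSlice` and `AsUnaligned` come from the PR; every field, method, signature, and the `dot` helper here is an assumption made for illustration, and the real definitions may differ.

```rust
use std::marker::PhantomData;

/// Hypothetical pointer + length pair with validity but no alignment requirement.
#[derive(Clone, Copy)]
pub struct UnalignedSlice<'a, T> {
    ptr: *const T,
    len: usize,
    _lifetime: PhantomData<&'a [T]>,
}

impl<'a, T: Copy> UnalignedSlice<'a, T> {
    /// # Safety
    /// `ptr` must be valid for reads of `len` initialized values of `T` for
    /// the lifetime `'a`. The pointer does not need to be aligned.
    pub unsafe fn from_raw_parts(ptr: *const T, len: usize) -> Self {
        Self { ptr, len, _lifetime: PhantomData }
    }

    pub fn len(&self) -> usize {
        self.len
    }

    /// Bounds-checked element read that tolerates any alignment.
    pub fn get(&self, i: usize) -> Option<T> {
        if i < self.len {
            // SAFETY: `i` is in bounds and the constructor guarantees validity.
            Some(unsafe { self.ptr.add(i).read_unaligned() })
        } else {
            None
        }
    }
}

impl<'a, T: Copy> From<&'a [T]> for UnalignedSlice<'a, T> {
    fn from(s: &'a [T]) -> Self {
        // SAFETY: a valid slice already satisfies the validity requirements.
        unsafe { Self::from_raw_parts(s.as_ptr(), s.len()) }
    }
}

/// Hypothetical replacement for `AsRef<[T]>` at kernel boundaries.
pub trait AsUnaligned<T> {
    fn as_unaligned(&self) -> UnalignedSlice<'_, T>;
}

impl<T: Copy> AsUnaligned<T> for [T] {
    fn as_unaligned(&self) -> UnalignedSlice<'_, T> {
        UnalignedSlice::from(self)
    }
}

impl<T: Copy, const N: usize> AsUnaligned<T> for [T; N] {
    fn as_unaligned(&self) -> UnalignedSlice<'_, T> {
        UnalignedSlice::from(self.as_slice())
    }
}

impl<T: Copy> AsUnaligned<T> for UnalignedSlice<'_, T> {
    fn as_unaligned(&self) -> UnalignedSlice<'_, T> {
        *self
    }
}

/// Scalar kernel that accepts aligned and unaligned inputs through one bound.
pub fn dot<A, B>(a: &A, b: &B) -> f32
where
    A: AsUnaligned<f32> + ?Sized,
    B: AsUnaligned<f32> + ?Sized,
{
    let (a, b) = (a.as_unaligned(), b.as_unaligned());
    assert_eq!(a.len(), b.len(), "dimension mismatch");
    (0..a.len()).map(|i| a.get(i).unwrap() * b.get(i).unwrap()).sum()
}
```

With this shape, existing callers keep passing slices and arrays, while callers holding raw, under-aligned storage construct the view directly and reach the same kernel.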

Code Generation

Unfortunately, the order in which functions are code-generated seems to have changed with this PR. That said, the fixed-sized specializations I have spot-checked result in identical assembly with this PR as with main, which is to be expected.
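
For readers unfamiliar with the fixed-size specialization pattern mentioned above, here is a simplified sketch of the general idea. It does not reproduce `Specialize<N, F>` or `FTarget`; `dot_dynamic`, `dot_fixed`, `dot_checked`, and `select` are hypothetical names. The point is that once a length check against a compile-time constant succeeds, the optimizer is free to treat the trip count as known.

```rust
/// Generic kernel: the trip count is only known at run time.
pub fn dot_dynamic(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Hypothetical fixed-size wrapper. After both length checks pass, the loop
/// bound is the constant `N`, which gives the optimizer the opportunity to
/// propagate it and unroll small kernels.
#[inline]
pub fn dot_fixed<const N: usize>(a: &[f32], b: &[f32]) -> Option<f32> {
    if a.len() != N || b.len() != N {
        return None;
    }
    let mut sum = 0.0f32;
    for i in 0..N {
        sum += a[i] * b[i];
    }
    Some(sum)
}

/// Fallback with the same signature as the specialized kernels.
fn dot_checked(a: &[f32], b: &[f32]) -> Option<f32> {
    (a.len() == b.len()).then(|| dot_dynamic(a, b))
}

/// Picks a kernel once per dimension and returns it as a function pointer.
pub fn select(dim: usize) -> fn(&[f32], &[f32]) -> Option<f32> {
    match dim {
        4 => dot_fixed::<4>,
        8 => dot_fixed::<8>,
        _ => dot_checked,
    }
}
```

Whether unrolling actually happens depends on the compiler version and optimization level, so inspecting the generated assembly, as described above, remains the way to confirm it.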


Copilot AI left a comment

Pull request overview

This PR adds first-class support in diskann-vector for computing SIMD-accelerated distances over truly under-aligned vector buffers (e.g., alignment 1), avoiding the need to copy data just to form &[T].

Changes:

  • Introduces UnalignedSlice + AsUnaligned and re-exports them from the crate root.
  • Updates SIMD distance kernels and specialization/dispatch plumbing to accept AsUnaligned inputs.
  • Extends Distance with call_unaligned and adds tests that exercise intentionally misaligned buffers.
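
The sketch below illustrates how a dispatched function pointer can take unaligned views while the slice-based entry point stays intact. Only the names `Distance`, `UnalignedSlice`, and `call_unaligned` come from the PR; the signatures, the `f32`-only view, `from_bytes`, and `squared_l2_kernel` are assumptions for illustration and are not the crate's actual API.

```rust
use std::marker::PhantomData;

/// Minimal `f32`-only stand-in for the PR's `UnalignedSlice` (assumed shape).
#[derive(Clone, Copy)]
pub struct UnalignedSlice<'a> {
    ptr: *const f32,
    len: usize,
    _marker: PhantomData<&'a [f32]>,
}

impl<'a> UnalignedSlice<'a> {
    pub fn from_slice(s: &'a [f32]) -> Self {
        Self { ptr: s.as_ptr(), len: s.len(), _marker: PhantomData }
    }

    /// Views `len` values starting at `bytes[0]`. This is safe because every
    /// 4-byte pattern is a valid `f32`; returns `None` if `bytes` is too short.
    pub fn from_bytes(bytes: &'a [u8], len: usize) -> Option<Self> {
        let needed = len.checked_mul(std::mem::size_of::<f32>())?;
        if bytes.len() < needed {
            return None;
        }
        Some(Self { ptr: bytes.as_ptr().cast(), len, _marker: PhantomData })
    }

    fn iter(self) -> impl Iterator<Item = f32> + 'a {
        // SAFETY: both constructors guarantee `len` readable elements.
        (0..self.len).map(move |i| unsafe { self.ptr.add(i).read_unaligned() })
    }
}

/// Signature that crosses the function-pointer boundary.
type DistanceFn = for<'a, 'b> fn(UnalignedSlice<'a>, UnalignedSlice<'b>) -> f32;

fn squared_l2_kernel(a: UnalignedSlice<'_>, b: UnalignedSlice<'_>) -> f32 {
    assert_eq!(a.len, b.len, "dimension mismatch");
    a.iter().zip(b.iter()).map(|(x, y)| (x - y) * (x - y)).sum()
}

pub struct Distance {
    f: DistanceFn,
}

impl Distance {
    pub fn squared_l2() -> Self {
        Self { f: squared_l2_kernel }
    }

    /// Existing slice-based entry point, kept by converting at the boundary.
    pub fn call(&self, a: &[f32], b: &[f32]) -> f32 {
        (self.f)(UnalignedSlice::from_slice(a), UnalignedSlice::from_slice(b))
    }

    /// Entry point for callers that only have under-aligned storage.
    pub fn call_unaligned(&self, a: UnalignedSlice<'_>, b: UnalignedSlice<'_>) -> f32 {
        (self.f)(a, b)
    }
}
```

Because both entry points funnel into the same function pointer, aligned and unaligned callers exercise identical kernel code, which is what makes "exact same results" a reasonable property to test.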

Reviewed changes

Copilot reviewed 8 out of 9 changed files in this pull request and generated 2 comments.

Show a summary per file
| File | Description |
| --- | --- |
| diskann-vector/src/unaligned.rs | Adds UnalignedSlice, AsUnaligned, and a test-only Buffer to create intentionally misaligned data. |
| diskann-vector/src/lib.rs | Exposes the new unaligned APIs from the crate root. |
| diskann-vector/src/test_util.rs | Refactors test harness to accept a &mut dyn DistanceChecker (trait object). |
| diskann-vector/src/distance/simd.rs | Changes simd_op to accept AsUnaligned and adds tests validating unaligned correctness/Miri safety. |
| diskann-vector/src/distance/implementations.rs | Updates architecture hooks and fixed-size specialization to operate on AsUnaligned / UnalignedSlice. |
| diskann-vector/src/distance/distance_provider.rs | Switches dispatched function signature to UnalignedSlice and adds Distance::call_unaligned. |
| diskann-vector/Cargo.toml | Adds bytemuck (dev) and enables half/bytemuck for tests. |
| diskann-providers/src/model/pq/distance/multi.rs | Adjusts reference distance calls to pass slices via explicit deref (&*...). |
| Cargo.lock | Records the new bytemuck dependency resolution. |
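
The explicit deref in multi.rs is consistent with a general Rust rule: deref coercion is not applied when the parameter type is generic. The sketch below demonstrates that rule with a hypothetical stand-in trait, `FloatSource`, implemented only for `[f32]`. Whether this is the actual reason for the change in the PR is an assumption, not something the summary states.

```rust
/// Hypothetical stand-in for a trait like `AsUnaligned`, implemented for the
/// unsized slice type only.
pub trait FloatSource {
    fn floats(&self) -> &[f32];
}

impl FloatSource for [f32] {
    fn floats(&self) -> &[f32] {
        self
    }
}

/// Generic entry point: the argument type is inferred, not coerced.
pub fn sum<A: FloatSource + ?Sized>(a: &A) -> f32 {
    a.floats().iter().sum()
}

pub fn demo() -> f32 {
    let v: Vec<f32> = vec![1.0, 2.0, 3.0];
    // `sum(&v)` would not compile here: inference picks `A = Vec<f32>`, which
    // does not implement `FloatSource`, and no deref coercion is attempted.
    // The explicit reborrow `&*v` has type `&[f32]`, so `A = [f32]`.
    sum(&*v)
}
```

When the parameter was a concrete `&[f32]`, `&v` coerced automatically, which is why a switch from a concrete slice parameter to a trait bound can require call sites like this to change.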

Comment thread diskann-vector/src/distance/implementations.rs Outdated
Comment thread diskann-vector/src/distance/simd.rs
codecov-commenter commented Apr 28, 2026

Codecov Report

❌ Patch coverage is 90.33816% with 20 lines in your changes missing coverage. Please review.
✅ Project coverage is 89.51%. Comparing base (be804aa) to head (0a4dbcb).

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| diskann-vector/src/distance/simd.rs | 85.89% | 11 Missing ⚠️ |
| diskann-vector/src/unaligned.rs | 90.90% | 6 Missing ⚠️ |
| diskann-vector/src/distance/distance_provider.rs | 75.00% | 3 Missing ⚠️ |
@@            Coverage Diff             @@
##             main     #981      +/-   ##
==========================================
- Coverage   90.63%   89.51%   -1.13%     
==========================================
  Files         460      461       +1     
  Lines       85424    85549     +125     
==========================================
- Hits        77427    76576     -851     
- Misses       7997     8973     +976     

| Flag | Coverage Δ |
| --- | --- |
| miri | 89.51% <90.33%> (-1.13%) ⬇️ |
| unittests | 89.35% <90.33%> (-1.25%) ⬇️ |

| Files with missing lines | Coverage Δ |
| --- | --- |
| diskann-providers/src/model/pq/distance/multi.rs | 96.11% <100.00%> (ø) |
| diskann-vector/src/distance/implementations.rs | 96.81% <100.00%> (+0.87%) ⬆️ |
| diskann-vector/src/lib.rs | 44.44% <ø> (ø) |
| diskann-vector/src/test_util.rs | 100.00% <100.00%> (ø) |
| diskann-vector/src/distance/distance_provider.rs | 99.29% <75.00%> (-0.71%) ⬇️ |
| diskann-vector/src/unaligned.rs | 90.90% <90.90%> (ø) |
| diskann-vector/src/distance/simd.rs | 77.35% <85.89%> (-12.43%) ⬇️ |

... and 39 files with indirect coverage changes


Copilot AI left a comment

Pull request overview

Copilot reviewed 8 out of 9 changed files in this pull request and generated 2 comments.

Comment thread diskann-vector/src/distance/implementations.rs
Comment thread diskann-vector/src/unaligned.rs
@arkrishn94 arkrishn94 left a comment

Looks good, Mark. I mostly had small comments. The only callout is the question about the indirection through AsUnaligned: I would love to understand why it is needed.

Comment thread diskann-vector/src/unaligned.rs
Comment thread diskann-vector/src/distance/distance_provider.rs
Comment thread diskann-vector/src/distance/implementations.rs
Comment thread diskann-vector/Cargo.toml